Olympic Data

1 Import the data

  NOC Year Decade     ID First.Name                   Name Last.Name Sex Age
1 AFG 1960  1960s  59346   Mohammad   Mohammad Asif Khokan    Khokan   M  24
2 AFG 1960  1960s  59043       Faiz Faiz Mohammad Khakshar  Khakshar   M  18
3 AFG 1960  1960s 109486      Abdul     Abdul Hadi Shekaib   Shekaib   M  20
  Height Weight      BMI BMI.Category        Team Population       GDP    GDPpC
1    171     78 26.67487            3 Afghanistan    8996973 537777800 59.77319
2    162     52 19.81405            0 Afghanistan    8996973 537777800 59.77319
3    178     68 21.46194            2 Afghanistan    8996973 537777800 59.77319
        Games Season City     Sport                                   Event
1 1960 Summer Summer Roma Wrestling Wrestling Men's Middleweight, Freestyle
2 1960 Summer Summer Roma Wrestling    Wrestling Men's Flyweight, Freestyle
3 1960 Summer Summer Roma Athletics              Athletics Men's 100 metres
     Medal Medal.No.Yes
1 No Medal            0
2 No Medal            0
3 No Medal            0
 [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
'data.frame':   151977 obs. of  24 variables:
 $ NOC         : Factor w/ 122 levels "AFG","ALB","AND",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Year        : int  1960 1960 1960 1960 1960 1960 1960 1960 1960 1960 ...
 $ Decade      : Factor w/ 6 levels "1960s","1970s",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ID          : int  59346 59043 109486 59102 128736 29626 39922 106372 128736 58364 ...
 $ First.Name  : Factor w/ 14118 levels "","A","A.","Aadam",..: 8716 3731 64 599 64 11978 64 4634 64 8716 ...
 $ Name        : Factor w/ 74268 levels "  Gabrielle Marie \"Gabby\" Adcock (White-)",..: 48941 19066 218 3341 220 64832 215 23793 220 48946 ...
 $ Last.Name   : Factor w/ 47370 levels "","-)","-Alard)",..: 23228 23112 38893 23137 44908 13260 16633 37860 44908 22890 ...
 $ Sex         : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
 $ Age         : int  24 18 20 35 20 28 22 23 20 20 ...
 $ Height      : int  171 162 178 166 179 168 172 170 179 166 ...
 $ Weight      : num  78 52 68 66 75 73 70 58 75 62 ...
 $ BMI         : num  26.7 19.8 21.5 24 23.4 ...
 $ BMI.Category: Factor w/ 5 levels "0","1","2","3",..: 4 1 3 3 3 4 3 3 3 3 ...
 $ Team        : Factor w/ 332 levels "Acipactli","Afghanistan",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Population  : int  8996973 8996973 8996973 8996973 8996973 8996973 8996973 8996973 8996973 8996973 ...
 $ GDP         : num  5.38e+08 5.38e+08 5.38e+08 5.38e+08 5.38e+08 ...
 $ GDPpC       : num  59.8 59.8 59.8 59.8 59.8 ...
 $ Games       : Factor w/ 30 levels "1960 Summer",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Season      : Factor w/ 2 levels "Summer","Winter": 1 1 1 1 1 1 1 1 1 1 ...
 $ City        : Factor w/ 29 levels "Albertville",..: 19 19 19 19 19 19 19 19 19 19 ...
 $ Sport       : Factor w/ 51 levels "Alpine Skiing",..: 51 51 3 51 3 51 3 3 3 51 ...
 $ Event       : Factor w/ 489 levels "Alpine Skiing Men's Combined",..: 478 468 17 476 33 482 22 24 18 466 ...
 $ Medal       : Factor w/ 4 levels "Bronze","Gold",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Medal.No.Yes: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
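
The import step and two of the derived columns can be sketched as follows. This is a minimal illustration, not the original import code: the inline CSV stands in for the real data file (whose name is not given here), and the formulas for `BMI` and `GDPpC` are assumptions that happen to reproduce the values printed above.

```r
# Inline CSV standing in for the real data file (hypothetical subset of columns).
# stringsAsFactors = TRUE reproduces the Factor columns seen in the str() output.
athletes <- read.csv(text = "NOC,Year,Height,Weight,Population,GDP
AFG,1960,171,78,8996973,537777800
AFG,1960,162,52,8996973,537777800
AFG,1960,178,68,8996973,537777800", stringsAsFactors = TRUE)

# Assumed formula: BMI = weight (kg) / height (m)^2, with Height given in cm.
athletes$BMI <- athletes$Weight / (athletes$Height / 100)^2

# Assumed formula: GDP per capita = GDP / Population.
athletes$GDPpC <- athletes$GDP / athletes$Population

head(athletes, 3)
str(athletes)
```

With these three rows the computed `BMI` and `GDPpC` agree with the printed columns to the displayed precision, which suggests (but does not prove) that the dataset was built this way.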

There is clearly an upward trend, but no seasonal pattern. The data is also a little choppy at the beginning, partly because the data points are not evenly spaced: most Olympic Games are 4 years apart, but a few are just 2 years apart, and World War I and World War II caused 8-year and 12-year gaps, respectively. Since time series data should be evenly spaced over time, we’ll only look at data from 1948 on, when the Olympics began to be held every 4 years without interruption.
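
The spacing argument can be checked directly. The sketch below uses the Summer Games years, including the 1906 Intercalated Games; treating that vector as the set of years in the data is an assumption, since the real year column is not shown here.

```r
# Summer Games years (assumed to match the years present in the data).
years <- c(1896, 1900, 1904, 1906, 1908, 1912,
           1920, 1924, 1928, 1932, 1936,
           seq(1948, 2016, by = 4))

# Gaps between consecutive Games: mostly 4, with 2-, 8- and 12-year exceptions.
gaps <- diff(years)
table(gaps)

# Keep only the evenly spaced stretch from 1948 on.
years_even <- years[years >= 1948]
all(diff(years_even) == 4)
```

Filtering on `Year >= 1948` is the simplest way to get an evenly spaced series; the alternative of interpolating over the gaps would mean inventing data points.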

2 Creating models

I’m going to try 4 different models.

\[ \begin{aligned} y_{\text{linear}}(x) &= ax + b \\ y_{\text{quadratic}}(x) &= ax^2 + bx + c \\ y_{\text{exponential}}(x) &= a\exp(bx) + c \\ y_{\text{cubic}}(x) &= ax^3 + bx^2 + cx + d \end{aligned} \]

I’ll use ANOVA to compare successive models: linear vs quadratic, which are nested, and exponential growth vs cubic, the s-curve candidate.
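
A comparison of this kind can be sketched with `nls()` and `anova()`. The column names in the tables in this report (`Res.Sum Sq`, `Sum Sq`) match what `anova()` prints for `nls` fits, so the sketch assumes that is how the models were fitted. The data here is synthetic (a noisy quadratic), not the Olympic data, and only the two polynomial models are shown; the exponential model follows the same pattern but needs sensible starting values to converge.

```r
set.seed(42)

# Hypothetical data: a noisy quadratic trend, for illustration only.
x <- 1:20
y <- 5 + 2 * x + 0.5 * x^2 + rnorm(length(x), sd = 2)
toy <- data.frame(x = x, y = y)

# Fit the linear and quadratic models by nonlinear least squares.
fit_linear <- nls(y ~ a * x + b, data = toy,
                  start = list(a = 1, b = 1))
fit_quadratic <- nls(y ~ a * x^2 + b * x + c, data = toy,
                     start = list(a = 1, b = 1, c = 1))

# F-test: does the extra quadratic term reduce the residual sum of squares
# by more than chance would explain?
comparison <- anova(fit_linear, fit_quadratic)
print(comparison)
```

A small `Pr(>F)` in the second row favours the larger model; a large one means the extra parameter is not earning its keep.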

Now I will fit these models to the number of events at each Olympic Games.

Res.Df  Res.Sum Sq  Df      Sum Sq     F value     Pr(>F)
    17    33.33158  NA          NA          NA         NA
    16    30.41244   1    2.919136    1.535759  0.2331213
    16    32.28073   0    0.000000          NA         NA
    15    29.04971   1    3.231026    1.668361  0.2160269
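
The F value and p-value in the second row can be reproduced by hand from the residual sums of squares. The sketch assumes the first two rows are the linear and quadratic fits, in the order the models are listed; the residual degrees of freedom (17 and 16) are consistent with that reading.

```r
# Values copied from the first two rows of the table above.
rss_linear <- 33.33158
df_linear <- 17
rss_quadratic <- 30.41244
df_quadratic <- 16

# F = (drop in RSS per extra parameter) / (residual variance of larger model).
f_value <- ((rss_linear - rss_quadratic) / (df_linear - df_quadratic)) /
  (rss_quadratic / df_quadratic)

# Upper-tail probability of the F distribution with (1, 16) degrees of freedom.
p_value <- pf(f_value, df_linear - df_quadratic, df_quadratic,
              lower.tail = FALSE)

c(F = f_value, p = p_value)
```

Both numbers agree with the table to the displayed precision. With a p-value of about 0.23, the quadratic term does not significantly improve on the linear fit at the usual 0.05 level.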

3 Sports

  Year Mean_Weight StdDev_Weight Mean_Height StdDev_Height    Sport Sex
1 1924    64.00000      0.000000    167.0000      0.000000 Swimming   F
2 1956    61.00000      4.780914    169.7333      3.634491 Swimming   F
3 1960    62.73469      5.619073    169.3469      6.839076 Swimming   F
4 1964    63.06000      6.466270    171.3600      4.378799 Swimming   F
5 1968    62.45455      5.361348    170.3636      4.583033 Swimming   F
6 1972    60.23611      5.491333    170.3889      4.949194 Swimming   F
'data.frame':   339 obs. of  7 variables:
 $ Year         : int  1924 1956 1960 1964 1968 1972 1976 1980 1984 1988 ...
 $ Mean_Weight  : num  64 61 62.7 63.1 62.5 ...
 $ StdDev_Weight: num  0 4.78 5.62 6.47 5.36 ...
 $ Mean_Height  : num  167 170 169 171 170 ...
 $ StdDev_Height: num  0 3.63 6.84 4.38 4.58 ...
 $ Sport        : Factor w/ 10 levels "Basketball","Canoeing",..: 9 9 9 9 9 9 9 9 9 9 ...
 $ Sex          : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
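
A per-year summary table with this layout can be built by grouping on `Year`, `Sport` and `Sex`. The sketch below uses base R `aggregate()` on a small made-up data frame; the athlete values are hypothetical, and the original summary may well have been produced differently (for example with dplyr).

```r
# Hypothetical athlete-level data, for illustration only.
swimmers <- data.frame(
  Year   = c(1960, 1960, 1960, 1964, 1964),
  Sport  = "Swimming",
  Sex    = "F",
  Weight = c(60, 62, 66, 61, 65),
  Height = c(168, 170, 172, 169, 173)
)

# Group means of weight and height.
means <- aggregate(cbind(Weight, Height) ~ Year + Sport + Sex,
                   data = swimmers, FUN = mean)
names(means)[4:5] <- c("Mean_Weight", "Mean_Height")

# Group standard deviations of weight and height.
sds <- aggregate(cbind(Weight, Height) ~ Year + Sport + Sex,
                 data = swimmers, FUN = sd)
names(sds)[4:5] <- c("StdDev_Weight", "StdDev_Height")

# One row per Year/Sport/Sex combination, as in the table above.
summary_stats <- merge(means, sds, by = c("Year", "Sport", "Sex"))
summary_stats
```

Note that `sd()` returns `NA` for a group with a single athlete, so a standard deviation of exactly 0 in a summary like this comes either from identical values or from a separate replacement step.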

Medal         mean
Bronze    25.55859
Gold      25.28269
No Medal  24.93049
Silver    25.48383

# A tibble: 6 x 3
# Groups:   Year [3]
   Year Sex   mean.Age
  <int> <fct>    <dbl>
1  1960 F         21.6
2  1960 M         26.0
3  1964 F         21.5
4  1964 M         25.7
5  1968 F         20.5
6  1968 M         25.1

3.1 Swimming

3.1.1 Female Athletes

Res.Df  Res.Sum Sq  Df      Sum Sq     F value     Pr(>F)
    13   8.5085546  NA          NA          NA         NA
    12   3.3398817   1    5.168673    18.57074  0.0010150
    12   8.1614545   0    0.000000          NA         NA
    11   0.6470143   1    7.514440   127.75427  0.0000002

Res.Df  Res.Sum Sq  Df      Sum Sq     F value     Pr(>F)
    13   11.697173  NA          NA          NA         NA
    12    5.842805   1    5.854368    12.02375  0.0046521
    12   11.347295   0    0.000000          NA         NA
    11    2.432617   1    8.914677    40.31109  0.0000545

Res.Df  Res.Sum Sq  Df      Sum Sq     F value     Pr(>F)
    13    4.326164  NA          NA          NA         NA
    12    4.163567   1    0.162597   0.4686279  0.5066258
    12    4.408289   0    0.000000          NA         NA
    11    2.732882   1    1.675407   6.7436063  0.0248333

3.1.2 Male Athletes

Res.Df  Res.Sum Sq  Df      Sum Sq     F value     Pr(>F)
    13   3.7207136  NA          NA          NA         NA
    12   1.8313700   1    1.889344    12.37987  0.0042351
    12   3.5597205   0    0.000000          NA         NA
    11   0.4074947   1    3.152226    85.09186  0.0000016

Res.Df  Res.Sum Sq  Df      Sum Sq     F value     Pr(>F)
    13   13.008921  NA          NA          NA         NA
    12   12.927824   1   0.0810975   0.0752771  0.7884689
    12   12.958849   0   0.0000000          NA         NA
    11    2.535541   1  10.4233087  45.2197021  0.0000327

Res.Df  Res.Sum Sq  Df      Sum Sq     F value     Pr(>F)
    13   10.265883  NA          NA          NA         NA
    12    4.739344   1    5.526539    13.99317  0.0028181
    12   10.762827   0    0.000000          NA         NA
    11    3.544380   1    7.218447    22.40248  0.0006162

Izzy Illari

20 April, 2020